Material and idea source: https://github.com/bupaverse
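library(bupaverse)  # core bupaR packages: event log classes, process maps, example event data
library(dplyr)      # %>%, rename(), mutate()
library(lubridate)  # ymd_hms(), used as the timestamp format below
sc1_data <- read.csv("C:/Users/nikol/Downloads/scenario1.csv", sep=";")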
head(sc1_data)
sc1_activitylog <- sc1_data %>%
# rename timestamp variables appropriately
dplyr::rename(start = activity_started,
complete = activity_ended) %>%
# convert the timestamp columns to date-time (POSIXct) values
convert_timestamps(columns = c("start", "complete"), format = ymd_hms) %>%
activitylog(case_id = "patient",
activity_id = "handling",
resource_id = "employee",
timestamps = c("start", "complete"))
#frequency: absolute, absolute-case, relative, relative-case, relative-antecedent, relative-consequent
print(process_map(sc1_activitylog, type=frequency("absolute")))
#performance: median/mean, "years"/"semesters"/"quarters"/"months"/"weeks"/"days"/"hours"/"mins"/"secs"
print(process_map(sc1_activitylog, type=performance(median,"secs")))
resource_map(sc1_activitylog, type=frequency("absolute"))
resource_map(sc1_activitylog, type=performance(median,"secs"))
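To make the node values of performance(median, "secs") more concrete, here is a minimal base-R sketch that computes the median duration (complete minus start) per activity for a small activity log. The activity names and timestamps are hypothetical, and I am assuming, rather than quoting from processmapR, that node values are aggregated per activity in roughly this way.

```r
# Hypothetical activity log: one row per activity instance
act <- data.frame(
  handling = c("Registration", "Triage", "Registration", "Triage"),
  start = as.POSIXct(c("2018-01-01 10:00:00", "2018-01-01 10:05:00",
                       "2018-01-02 09:00:00", "2018-01-02 09:20:00"), tz = "UTC"),
  complete = as.POSIXct(c("2018-01-01 10:02:00", "2018-01-01 10:15:00",
                          "2018-01-02 09:04:00", "2018-01-02 09:26:00"), tz = "UTC")
)

# Duration of each activity instance in seconds
act$secs <- as.numeric(difftime(act$complete, act$start, units = "secs"))

# Median duration per activity, similar in spirit to performance(median, "secs")
median_secs <- aggregate(secs ~ handling, data = act, FUN = median)
print(median_secs)
```

Swapping median for mean in the aggregate() call corresponds to performance(mean, "secs").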
sc2_data <- read.csv("C:/Users/nikol/Downloads/scenario2.csv", sep=";")
head(sc2_data)
sc2_data %>%
# recode lifecycle variable appropriately
dplyr::mutate(registration_type = forcats::fct_recode(registration_type,
"start" = "started",
"complete" = "completed")) %>%
convert_timestamps(columns = "time", format = ymd_hms) %>%
eventlog(case_id = "patient",
activity_id = "handling",
activity_instance_id = "handling_id",
lifecycle_id = "registration_type",
timestamp = "time",
resource_id = "employee") %>%
to_activitylog() -> sc2_activitylog
sc2_activitylog
# Log of 10 events consisting of:
1 trace
1 case
5 instances of 5 activities
5 resources
Events occurred from 2018-09-20 17:16:02 until 2021-09-20 17:03:14
# Variables were mapped as follows:
Case identifier: patient
Activity identifier: handling
Resource identifier: employee
Timestamps: start, complete
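to_activitylog() turns an event log with one row per lifecycle event into an activity log with one row per activity instance and separate start and complete columns. The base-R sketch below mimics that reshaping on hypothetical data that reuses the column names of the scenario files; it illustrates the idea and is not how bupaR implements the conversion.

```r
# Hypothetical event log: two lifecycle events per activity instance
events <- data.frame(
  handling_id = c("1", "1", "2", "2"),
  handling = c("Registration", "Registration", "Triage", "Triage"),
  registration_type = c("started", "completed", "started", "completed"),
  time = c("2018-09-20 17:16:02", "2018-09-20 17:20:00",
           "2018-09-20 17:25:00", "2018-09-20 17:40:00")
)

# Recode the lifecycle labels, as fct_recode() does in the code above
lookup <- c(started = "start", completed = "complete")
events$registration_type <- unname(lookup[events$registration_type])

# Pair the start and complete rows of each activity instance
starts <- events[events$registration_type == "start", c("handling_id", "handling", "time")]
completes <- events[events$registration_type == "complete", c("handling_id", "time")]
activity_log <- merge(starts, completes, by = "handling_id",
                      suffixes = c("_start", "_complete"))
print(activity_log)
```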
#frequency: absolute, absolute-case, relative, relative-case, relative-antecedent, relative-consequent
print(process_map(sc2_activitylog, type=frequency("absolute")))
#performance: median/mean, "years"/"semesters"/"quarters"/"months"/"weeks"/"days"/"hours"/"mins"/"secs"
print(process_map(sc2_activitylog, type=performance(median,"secs")))
resource_map(sc2_activitylog, type=frequency("absolute"))
resource_map(sc2_activitylog, type=performance(median,"secs"))
sc3_data <- read.csv("C:/Users/nikol/Downloads/scenario3.csv", sep=";")
head(sc3_data)
sc3_activitylog <- sc3_data %>%
# recode lifecycle variable appropriately
dplyr::mutate(registration_type = forcats::fct_recode(registration_type,
"start" = "started",
"complete" = "completed")) %>%
convert_timestamps(columns = "time", format = ymd_hms) %>%
eventlog(case_id = "patient",
activity_id = "handling",
activity_instance_id = "handling_id",
lifecycle_id = "registration_type",
timestamp = "time",
resource_id = "employee")
Warning in validate_eventlog(eventlog) :
The following activity instances are connected to more than one resource: 125,625,1060,1297,1859,2354
sc3_activitylog
# Log of 12 events consisting of:
1 trace
1 case
6 instances of 6 activities
6 resources
Events occurred from 2004-05-20 17:21:29 until 2012-05-20 17:21:59
# Variables were mapped as follows:
Case identifier: patient
Activity identifier: handling
Resource identifier: employee
Activity instance identifier: handling_id
Timestamp: time
Lifecycle transition: registration_type
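The validate_eventlog() warning for scenario 3 means that some activity instances (handling_id values) have events recorded by more than one employee. A minimal base-R sketch for finding such instances, run here on a hypothetical four-row log with the same column names; applying the same tapply() call to sc3_data should list the offending IDs, assuming the columns are read as in the code above.

```r
# Hypothetical events: instance "1" is registered by two different employees
events <- data.frame(
  handling_id = c("1", "1", "2", "2"),
  registration_type = c("start", "complete", "start", "complete"),
  employee = c("r1", "r2", "r3", "r3")
)

# Number of distinct resources per activity instance
n_resources <- tapply(events$employee, events$handling_id,
                      function(x) length(unique(x)))

# Instances connected to more than one resource
conflicting <- names(n_resources)[n_resources > 1]
print(conflicting)
```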
#frequency: absolute, absolute-case, relative, relative-case, relative-antecedent, relative-consequent
print(process_map(sc3_activitylog, type=frequency("absolute")))
#performance: median/mean, "years"/"semesters"/"quarters"/"months"/"weeks"/"days"/"hours"/"mins"/"secs"
print(process_map(sc3_activitylog, type=performance(median,"secs")))
resource_map(sc3_activitylog, type=frequency("absolute"))
resource_map(sc3_activitylog, type=performance(median,"secs"))
In the examples below, I use a slightly filtered version of the traffic_fines data set that keeps the 95% of cases with the most frequent traces.
tmp <- traffic_fines %>%
filter_trace_frequency(percentage = 0.95)
tmp %>%
process_map(frequency("absolute"))
In relative terms, Payment represents 14.51% of all activity instances. Of its occurrences, 94.66% were the last activity of a case, and the remaining 5.34% were followed by another Payment.
tmp %>%
process_map(frequency("relative"))
In the relative-case map, Payment occurs in 46.21% of the cases, and in 2.6% of the cases a Payment is followed by another Payment.
tmp %>%
process_map(frequency("relative-case"))
Finally, the relative-consequent map shows what happens before each activity, so the incoming edges of every activity add up to 100%. Payment, for example, was preceded by:
Create Fine (73.15%)
Add Penalty (21.51%)
Payment (5.34%)
Payment itself represents 14.51% of all activity executions.
tmp %>%
process_map(frequency("relative-consequent"))
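The difference between the relative and relative-case node values can be reproduced by hand. In this sketch on four hypothetical traces, relative divides an activity's instance count by the total number of activity instances, while relative-case divides the number of cases containing the activity by the number of cases. This is my reading of the two definitions, not processmapR code.

```r
# Hypothetical traces, one activity sequence per case
traces <- list(
  c("Create Fine", "Payment"),
  c("Create Fine", "Send Fine", "Add Penalty", "Payment", "Payment"),
  c("Create Fine", "Send Fine"),
  c("Create Fine", "Send Fine", "Add Penalty")
)

# relative: share of all activity instances
instances <- unlist(traces)
relative <- table(instances) / length(instances)

# relative-case: share of cases in which the activity occurs at least once
case_presence <- unlist(lapply(traces, unique))
relative_case <- table(case_presence) / length(traces)

print(relative)
print(relative_case)
```

A repeated Payment in one case counts twice for relative but only once for relative-case.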
Instead of using frequencies, I can also use process maps to visualize the performance of the process, by using performance() to configure the map instead of frequency().
There are three parameters specific to the performance() configuration that I can adjust: the aggregation function (FUN), the time units (units), and the flow time type (flow_time).
patients %>%
process_map(performance())
The FUN argument specifies the aggregation function I want to apply to the processing time (e.g., min, max, mean, median). By default, the mean durations are shown, but I can change this to the maximum, for example.
patients %>%
process_map(performance(FUN = max))
Warning: There was 1 warning in `summarize()`.
ℹ In argument: `label = do.call(...)`.
ℹ In group 10: `ACTIVITY_CLASSIFIER_ = NA` and `from_id = NA`.
Caused by warning in `type()`:
! no non-missing arguments to max; returning -Inf
Warning: There were 2 warnings in `summarize()`.
The first warning was:
ℹ In argument: `value = do.call(...)`.
ℹ In group 1: `ACTIVITY_CLASSIFIER_ = "ARTIFICIAL_END"`, `next_act = NA`, `from_id = 1`, `to_id = NA`.
Caused by warning in `type()`:
! no non-missing arguments to max; returning -Inf
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
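The warnings above come from max() being called on an empty vector, which in R returns -Inf with a warning; judging from the group labels, this happens for the artificial start and end nodes, which have no durations. If the warnings are a nuisance, one option is a hypothetical wrapper, max_or_na(), that returns NA for empty input. I have not checked how process_map() renders an NA label, so treat its use as FUN as an untested assumption.

```r
# Hypothetical aggregation function: like max(), but returns NA (and no
# warning) when there is nothing to aggregate.
# `...` absorbs extra arguments such as na.rm that a caller may pass.
max_or_na <- function(x, ...) {
  x <- x[!is.na(x)]
  if (length(x) == 0) {
    return(NA_real_)
  }
  max(x)
}

max_or_na(c(3, 7, NA))  # 7
max_or_na(numeric(0))   # NA, without a warning
```

Usage (untested): patients %>% process_map(performance(FUN = max_or_na)).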
Any function that takes a numeric vector and returns a single value can be used. For example, to show the 90th percentile, I can define a small wrapper around quantile():
# 90th percentile (0.9 quantile); `...` forwards extra arguments such as na.rm to quantile()
p90 <- function(x, ...) {
quantile(x, probs = 0.9, ...)
}
patients %>%
process_map(performance(FUN = p90))
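A quick base-R check of the "one number out" requirement on a hypothetical vector of durations. Each aggregator is called with na.rm = TRUE; I assume, without having confirmed it in processmapR, that extra arguments like this are why p90 takes `...`. The names = FALSE argument only drops the "90%" label that quantile() attaches.

```r
# Hypothetical durations (e.g., in minutes)
durations <- c(2, 3, 5, 8, 40)

# 90th percentile without the "90%" name
p90 <- function(x, ...) {
  quantile(x, probs = 0.9, names = FALSE, ...)
}

# Each candidate aggregator reduces the vector to a single number
aggregators <- list(mean = mean, median = median, max = max, p90 = p90)
summary_values <- sapply(aggregators, function(f) f(durations, na.rm = TRUE))
print(summary_values)
```

The single outlier (40) pulls the mean well above the median, which is one reason to look at median or a percentile instead of the default mean.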
The units argument allows me to specify the time units I want to use.
For example, in days:
patients %>%
process_map(performance(mean, "days"))
For example, in hours:
patients %>%
process_map(performance(mean, "hours"))
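Changing units only rescales the same durations. As a base-R illustration (not processmapR's own conversion), the same interval between two hypothetical timestamps expressed in days and in hours; calendar-based units such as "months" or "quarters" are not difftime units, so this covers only fixed-length units.

```r
# Two hypothetical timestamps, 2.5 days apart
start <- as.POSIXct("2017-01-02 11:41:53", tz = "UTC")
end <- as.POSIXct("2017-01-04 23:41:53", tz = "UTC")
elapsed <- difftime(end, start)

# The same duration expressed in different units
in_days <- as.numeric(elapsed, units = "days")
in_hours <- as.numeric(elapsed, units = "hours")
c(days = in_days, hours = in_hours)
```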
I can use different profiles for nodes and edges by using the type_nodes and type_edges arguments instead of the type argument. This way, I can combine information about frequencies, performance, or any other value in the same graph.
patients %>%
process_map(type_nodes = frequency("relative_case"),
type_edges = performance(mean))
With the sec argument, I can add a second layer of information to both nodes and edges.
patients %>%
process_map(type = frequency("relative_case"),
sec = frequency("absolute"))
With type_nodes, type_edges, sec_nodes, and sec_edges, I can set the primary and secondary layers separately for nodes and edges.
patients %>%
process_map(type_nodes = frequency("relative_case"),
type_edges = performance(units = "hours"),
sec_nodes = frequency("absolute"),
sec_edges = performance(FUN = max, units = "hours"))
Warning: There were 2 warnings in `summarize()`.
The first warning was:
ℹ In argument: `value = do.call(...)`.
ℹ In group 1: `ACTIVITY_CLASSIFIER_ = "ARTIFICIAL_END"`, `next_act = NA`, `from_id = 1`, `to_id = NA`.
Caused by warning in `type()`:
! no non-missing arguments to max; returning -Inf
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
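Both frequency() and performance() accept the arguments color_scale and color_edges to customize the colors in the process map: color_scale sets the color palette of the nodes (an RColorBrewer palette name such as "PuBu"), and color_edges sets the color of the edges.
Configuring the colors is useful for harmonizing the look of the process map when nodes and edges show different layers.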
patients %>%
process_map(type_nodes = frequency("relative_case", color_scale = "PuBu"),
type_edges = performance(mean, color_edges = "dodgerblue4"))
For the animations, I use the patients event log provided by the eventdataR package.
Here’s a basic animation with static color and token size:
library(processanimateR)  # provides animate_process(), token_aes(), token_scale()
animate_process(patients)
I can change the default token color, size, or image as follows:
animate_process(patients, mapping = token_aes(size = token_scale(12), shape = "rect"))
A different token color:
animate_process(patients, mapping = token_aes(color = token_scale("red")))
A combination of these options: relative time mode, jittered tokens, and tokens colored by employee with an ordinal color scale and a color legend:
animate_process(patients, mode = "relative", jitter = 10, legend = "color",
mapping = token_aes(color = token_scale("employee",
scale = "ordinal",
range = RColorBrewer::brewer.pal(7, "Paired"))))
traffic_fines %>%
process_matrix(frequency("absolute"))
The Absolute Process Matrix is a tool I use to visualize the frequency of transitions between activities in a process, where the matrix displays the count of occurrences for each pair of activities.
traffic_fines %>%
process_matrix(frequency("absolute")) %>%
plot()
traffic_fines %>%
process_matrix(frequency("relative-case"))
The Relative-case Process Matrix shows, for each pair of consecutive activities, the share of cases in which that transition occurs, which gives insight into the proportion of cases that follow specific activity paths.
traffic_fines %>%
process_matrix(frequency("relative-case")) %>%
plot()
traffic_fines %>%
process_matrix(frequency("relative-antecedent"))
The Relative-antecedent Process Matrix divides each transition count by the number of transitions leaving its antecedent activity, so it shows how often each activity is followed by each specific activity; the values for a given antecedent add up to 100%.
traffic_fines %>%
process_matrix(frequency("relative-antecedent")) %>%
plot()
traffic_fines %>%
process_matrix(frequency("relative-consequent"))
The Relative-consequent Process Matrix divides each transition count by the number of transitions entering its consequent activity, so it shows how often each activity is preceded by each specific activity; the values for a given consequent add up to 100%, just as the edges entering Payment do in the relative-consequent process map above.
traffic_fines %>%
process_matrix(frequency("relative-consequent")) %>%
plot()
traffic_fines %>%
process_matrix(performance(FUN = mean, units = "weeks"))
The Performance Process Matrix shows a time measure for each transition instead of a count, here the mean time between the two activities expressed in weeks, which lets me analyze how efficiently different process paths are performed.
traffic_fines %>%
process_matrix(performance(FUN = mean, units = "weeks")) %>%
plot()
The various versions of the process matrices provide different perspectives on the transitions between activities in a process.
Absolute Process Matrix displays the raw frequency of transitions between activities, showing the total count of occurrences for each pair of activities, regardless of the number of cases.
Relative-case Process Matrix provides a normalized view by showing the frequency of transitions as a proportion of the total number of cases. This allows for better insight into the relative occurrence of each activity pair in the overall process.
Relative-antecedent Process Matrix focuses on what follows each activity. Each transition count is divided by the number of transitions leaving the antecedent, so the values for each antecedent add up to 100%.
Relative-consequent Process Matrix focuses on what precedes each activity. Each transition count is divided by the number of transitions entering the consequent, so the values for each consequent add up to 100%.
Performance Process Matrix replaces counts with a time measure, showing how long each transition takes (here, the mean in weeks) and highlighting slow or fast paths in the process.
In summary, while the Absolute Process Matrix gives raw counts, the relative matrices provide normalized insights into the relationships between activities, and the Performance Process Matrix adds a layer of analysis on the time or efficiency of transitions.
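The normalizations behind these matrices can be reproduced on a toy log. This base-R sketch uses three hypothetical traces, builds the directly-follows pairs, and derives the absolute, relative-antecedent, relative-consequent, and relative-case values. It reflects my reading of the definitions above, not processmapR's code, and it leaves out the artificial start and end activities that processmapR adds.

```r
# Hypothetical traces (activity sequences), one per case
traces <- list(
  c("Create Fine", "Payment"),
  c("Create Fine", "Send Fine", "Payment"),
  c("Create Fine", "Send Fine")
)

# Directly-follows pairs (antecedent -> consequent), tagged with their case
pairs <- do.call(rbind, lapply(seq_along(traces), function(i) {
  tr <- traces[[i]]
  data.frame(case = i,
             antecedent = head(tr, -1),
             consequent = tail(tr, -1))
}))

# absolute: number of times each pair occurs
abs_mat <- table(pairs$antecedent, pairs$consequent)

# relative-antecedent: divide by the antecedent's outgoing total (rows sum to 1)
rel_ante <- prop.table(abs_mat, margin = 1)

# relative-consequent: divide by the consequent's incoming total (columns sum to 1)
rel_cons <- prop.table(abs_mat, margin = 2)

# relative-case: share of cases in which each pair occurs at least once
case_pairs <- unique(pairs)
rel_case <- table(case_pairs$antecedent, case_pairs$consequent) / length(traces)

print(abs_mat)
print(rel_ante)
print(rel_cons)
```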
The Absolute Dotted Chart plots every event as a dot: the x-axis shows the actual (absolute) timestamp, each case is a separate line on the y-axis, and the dot color indicates the activity. This gives a clear view of how cases and activities are spread over calendar time.
sepsis %>%
dotted_chart(x = "absolute")
The Absolute Dotted Chart with the option sort = "end" orders the cases on the y-axis by their end time, which makes it easier to see when cases finish.
sepsis %>%
dotted_chart(x = "absolute", sort = "end")
The Relative Dotted Chart places each event by the time elapsed since the start of its case instead of by calendar time, so all cases start at zero. This makes it easy to compare how long cases take and when specific activities typically happen within a case.
sepsis %>%
dotted_chart(x = "relative")
The relative_week Dotted Chart places each event by its position within the week (day of the week and time of day), which reveals weekly patterns such as little activity during weekends. Here, the colors come from ggplot2's default discrete scale, passed via scale_color.
sepsis %>%
dotted_chart(x = "relative_week",
scale_color = ggplot2::scale_color_discrete)
The relative_day Dotted Chart places each event by its time of day, which shows how activities are distributed over the hours of the day.
sepsis %>%
dotted_chart(x = "relative_day")
The same relative_week Dotted Chart with the default color scale:
sepsis %>%
dotted_chart(x = "relative_week")
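The x options of dotted_chart() differ only in how an event's timestamp is turned into a position. This base-R sketch computes such positions in seconds for two hypothetical cases. It is my approximation, not processmapR code; in particular, the assumption that weeks start on Monday is labeled in the code and may not match the package.

```r
# Hypothetical events of two cases (timestamps in UTC)
events <- data.frame(
  case = c("A", "A", "B", "B"),
  time = as.POSIXct(c("2014-10-22 11:15:41", "2014-10-22 11:27:00",
                      "2014-12-21 11:45:41", "2014-12-22 09:00:00"), tz = "UTC")
)

# x = "absolute": the timestamp itself, so nothing to compute

# x = "relative": seconds since the first event of the same case
events$relative <- ave(as.numeric(events$time), events$case,
                       FUN = function(t) t - min(t))

# x = "relative_day": seconds since midnight
lt <- as.POSIXlt(events$time)
events$relative_day <- lt$hour * 3600 + lt$min * 60 + lt$sec

# x = "relative_week": seconds since the start of the week
# (assumption: weeks start on Monday; lt$wday is 0 for Sunday)
events$relative_week <- ((lt$wday + 6) %% 7) * 86400 + events$relative_day
print(events)
```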
The trace_explorer() function shows the distinct traces (activity sequences) in a log together with how many cases follow each one, which helps to identify common patterns as well as rare or anomalous executions.
sepsis %>%
trace_explorer()
Warning: No `coverage` or `n_traces` set.
! Defaulting to `coverage` = 0.2 for `type` = "frequent" traces.
The code sepsis %>% trace_explorer(coverage = 0.15)
shows the most frequent traces in the sepsis dataset that together
cover 15% of the cases, focusing the analysis on the most
representative process paths.
sepsis %>%
trace_explorer(coverage = 0.15)
The code sepsis %>% trace_explorer(n_traces = 10)
shows the 10 most frequent traces in the sepsis dataset,
providing a closer look at a subset of the process flows for detailed
analysis.
sepsis %>%
trace_explorer(n_traces = 10)
The code
sepsis %>% trace_explorer(n_traces = 10, type = "infrequent")
allows me to explore the 10 least frequent traces in the sepsis dataset,
helping to analyze rare or unusual process paths.
sepsis %>%
trace_explorer(n_traces = 10, type = "infrequent")
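The coverage and n_traces options can be mimicked in base R on a hypothetical set of case traces: count the traces, sort them by frequency, and take either the top traces until a target share of cases is reached or the n least frequent ones. Both helper functions are hypothetical, and whether trace_explorer() includes the trace that crosses the coverage threshold is my assumption.

```r
# Hypothetical log: one trace (activity sequence) per case
case_traces <- c("A,B,C", "A,B,C", "A,B,C", "A,B,C", "A,C", "A,C", "A,C",
                 "A,B,B,C", "A,D", "A,E")

# Trace frequencies, most frequent first
trace_freq <- sort(table(case_traces), decreasing = TRUE)
relative <- as.numeric(trace_freq) / length(case_traces)
cumulative <- cumsum(relative)

# coverage: most frequent traces until their cumulative share of cases
# reaches the target (small tolerance for floating-point sums)
traces_for_coverage <- function(coverage) {
  names(trace_freq)[seq_len(which(cumulative >= coverage - 1e-9)[1])]
}

# n_traces with type = "infrequent": the n least frequent traces
least_frequent <- function(n) {
  names(sort(table(case_traces)))[seq_len(n)]
}

traces_for_coverage(0.15)
traces_for_coverage(0.5)
least_frequent(2)
```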
The code
sepsis %>% trace_explorer(n_traces = 10, coverage_labels = c("cumulative", "relative"))
shows the 10 most frequent traces in the sepsis dataset,
displaying both cumulative and relative coverage labels to better
understand how the cases are distributed over the traces.
sepsis %>%
trace_explorer(n_traces = 10,
coverage_labels = c("cumulative", "relative"))
The code
sepsis %>% trace_explorer(n_traces = 10, label_size = 4)
shows the 10 most frequent traces in the sepsis dataset while
setting the label size to 4, making the trace labels more readable for
better analysis.
sepsis %>%
trace_explorer(n_traces = 10, label_size = 4)
The code
sepsis %>% trace_explorer(n_traces = 10, show_labels = FALSE)
shows the 10 most frequent traces in the sepsis dataset without
displaying the labels, providing a cleaner view of the trace
sequences.
sepsis %>%
trace_explorer(n_traces = 10,
show_labels = FALSE)
The code
sepsis %>% trace_explorer(n_traces = 10, abbreviate = FALSE)
shows the 10 most frequent traces in the sepsis dataset without
abbreviating the activity labels, providing a full view of the trace
details.
sepsis %>%
trace_explorer(n_traces = 10, abbreviate = FALSE)
The code
sepsis %>% trace_explorer(n_traces = 10, scale_fill = ggplot2::scale_fill_discrete)
shows the 10 most frequent traces in the sepsis dataset, applying
a discrete color scale from ggplot2 to distinguish the activities.
sepsis %>%
trace_explorer(n_traces = 10,
scale_fill = ggplot2::scale_fill_discrete)
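The code traffic_fines %>% ps_detailed() draws a
detailed performance spectrum of the traffic_fines dataset: for each
segment (a pair of consecutive activities), every case passing through it
is drawn as a line from the time of the first activity to the time of
the second, which shows how transition durations vary over time.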
library(psmineR)  # provides ps_detailed() and ps_aggregated()
traffic_fines %>%
ps_detailed()
The code
traffic_fines %>% ps_detailed(n_segments = 10) generates
a detailed performance spectrum of the traffic_fines dataset,
restricted to the 10 most frequent segments.
traffic_fines %>%
ps_detailed(n_segments = 10)
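The building blocks of a performance spectrum are segment occurrences: for every case, each pair of consecutive activities with the time of the first and of the second. This base-R sketch extracts them from a hypothetical log and ranks segments by frequency, which is how I understand n_segments to pick segments (an assumption); drawing the spectrum itself is left to psmineR.

```r
# Hypothetical event log, already one row per event
log <- data.frame(
  case = c(1, 1, 1, 2, 2, 3, 3),
  activity = c("Create Fine", "Send Fine", "Payment",
               "Create Fine", "Payment",
               "Create Fine", "Send Fine"),
  time = as.POSIXct(c("2006-07-24", "2006-12-05", "2007-01-10",
                      "2006-08-02", "2006-08-20",
                      "2006-08-07", "2006-12-12"), tz = "UTC")
)
log <- log[order(log$case, log$time), ]

# One row per segment occurrence: consecutive activities within a case
segments <- do.call(rbind, lapply(split(log, log$case), function(d) {
  data.frame(case = rep(d$case[1], nrow(d) - 1),
             segment = paste(head(d$activity, -1), tail(d$activity, -1), sep = " -> "),
             from_time = head(d$time, -1),
             to_time = tail(d$time, -1))
}))
segments$days <- as.numeric(difftime(segments$to_time, segments$from_time, units = "days"))

# Segments ranked by frequency, most frequent first
segment_freq <- sort(table(segments$segment), decreasing = TRUE)
print(segments)
print(segment_freq)
```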
The code
traffic_fines %>% ps_detailed(classification = "resource")
generates a detailed performance spectrum of the traffic_fines dataset
in which the lines are colored by the resource attribute, giving insight
into which resources are involved in fast or slow transitions.
traffic_fines %>%
ps_detailed(classification = "resource")
The code
traffic_fines %>% end_activities("case") %>% augment(traffic_fines, prefix = "end") %>% ps_detailed(classification = "end_activity")
determines the end activity of each case in the traffic_fines dataset,
adds it to the log as the column end_activity, and draws a detailed
performance spectrum with the lines colored by end activity.
traffic_fines %>%
end_activities("case") %>%
augment(traffic_fines, prefix = "end") %>%
ps_detailed(classification = "end_activity")
The code
traffic_fines %>% end_activities("case") %>% augment(traffic_fines, prefix = "end") %>% group_by(end_activity) %>% ps_aggregated()
determines the end activity of each case in the traffic_fines dataset,
adds it to the log, groups the log by end activity, and draws an
aggregated performance spectrum for each end activity.
traffic_fines %>%
end_activities("case") %>%
augment(traffic_fines, prefix = "end") %>%
group_by(end_activity) %>%
ps_aggregated()
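The end_activities("case") %>% augment(prefix = "end") step amounts to computing one value per case and copying it onto every event of that case, which is what makes end_activity usable for classification and grouping. A base-R sketch of that idea on a hypothetical log; it is not the edeaR implementation.

```r
# Hypothetical event log; time is just an ordering index here
log <- data.frame(
  case = c(1, 1, 2, 2, 2, 3),
  activity = c("Create Fine", "Payment",
               "Create Fine", "Send Fine", "Add Penalty",
               "Create Fine"),
  time = c(1, 2, 1, 2, 3, 1)
)
log <- log[order(log$case, log$time), ]

# Last activity of each case, as a named character vector
end_activity <- sapply(split(log$activity, log$case), function(a) a[length(a)])

# Copy the case-level value onto every event of the case
log$end_activity <- unname(end_activity[as.character(log$case)])
print(log)
```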
The code patients %>% activity_presence() %>% plot
visualizes the percentage of cases in which each activity is present in
the patients dataset, providing an overview of activity presence without
requiring a level argument.
patients %>% activity_presence() %>%
plot
The code patients %>% activity_frequency("activity")
calculates the frequency of each activity in the patients dataset,
providing insights into how often different activities occur within the
process.
patients %>%
activity_frequency("activity")
The code
patients %>% start_activities("resource-activity")
identifies the starting activities in the patients dataset based on the
resource-activity classification, helping to analyze the initial steps
involving specific resources.
patients %>%
start_activities("resource-activity")
The code
patients %>% end_activities("resource-activity")
identifies the end activities in the patients dataset based on the
resource-activity classification, allowing me to analyze the final steps
involving specific resources.
patients %>%
end_activities("resource-activity")
The code
patients %>% trace_coverage("trace") %>% plot()
visualizes trace coverage in the patients dataset: traces are ordered
by frequency, and the plot shows the cumulative share of cases they
cover, i.e. how many distinct traces are needed to describe a given share
of the cases.
patients %>%
trace_coverage("trace") %>%
plot()
The code patients %>% trace_length("log") %>% plot
visualizes the distribution of trace lengths in the patients dataset,
helping to analyze the variability in the number of activities within
each trace.
patients %>%
trace_length("log") %>%
plot
The code
patients %>% idle_time("resource", units = "days")
calculates the idle time for each resource in the patients dataset,
measuring the time in days when resources were not in use.
patients %>%
idle_time("resource", units = "days")
The code
patients %>% idle_time("resource", units = "days") %>% plot()
visualizes the idle time of resources in the patients dataset, showing
how many days each resource was inactive.
patients %>%
idle_time("resource", units = "days") %>%
plot()
The code
patients %>% processing_time("activity") %>% plot
visualizes the processing time for each activity in the patients
dataset, providing insights into the duration of different activities
within the process.
patients %>%
processing_time("activity") %>%
plot
The code
patients %>% throughput_time("log") %>% plot()
visualizes the throughput time in the patients dataset, displaying the
time taken for cases to move through the entire process.
patients %>%
throughput_time("log") %>%
plot()
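Throughput time and processing time answer different questions: the first runs from the first start to the last completion of a case, the second sums the durations of its activities. A base-R sketch on a hypothetical two-patient log, following my reading of these definitions rather than edeaR's code; when activities do not overlap, the gap between the two is roughly the time the case spent waiting.

```r
# Hypothetical activity log for two patients
log <- data.frame(
  patient = c("1", "1", "2", "2"),
  start = as.POSIXct(c("2017-01-02 08:00:00", "2017-01-04 10:00:00",
                       "2017-01-03 09:00:00", "2017-01-03 15:00:00"), tz = "UTC"),
  complete = as.POSIXct(c("2017-01-02 09:00:00", "2017-01-04 20:00:00",
                          "2017-01-03 10:00:00", "2017-01-03 21:00:00"), tz = "UTC")
)
by_patient <- split(log, log$patient)

# Throughput time: first start to last completion of each case (days)
throughput_days <- sapply(by_patient, function(d) {
  as.numeric(difftime(max(d$complete), min(d$start), units = "days"))
})

# Processing time: sum of the activity durations of each case (hours)
processing_hours <- sapply(by_patient, function(d) {
  sum(as.numeric(difftime(d$complete, d$start, units = "hours")))
})
print(throughput_days)
print(processing_hours)
```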
The code patients %>% resource_frequency("resource")
calculates the frequency of each resource in the patients dataset,
showing how often different resources are utilized throughout the
process.
patients %>%
resource_frequency("resource")
The code
patients %>% resource_involvement(level = "resource") %>% plot
visualizes the involvement of each resource in the patients dataset,
highlighting how frequently different resources participate in the
process.
In this example, only r1 and r2 are involved in all cases, and r6 and r7 are involved in most of the cases, while the others are involved in roughly half of the cases.
patients %>%
resource_involvement(level = "resource") %>% plot
The code
patients %>% resource_specialisation("resource")
analyzes the specialization of each resource in the patients dataset
by counting how many different activities each resource performs.
In the simple patients event log, each resource is performing exactly one activity, and is therefore 100% specialized.
patients %>%
resource_specialisation("resource")
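Both resource metrics can be reproduced by hand on a hypothetical log: involvement is the share of cases in which a resource appears, and specialisation counts the distinct activities a resource performs. As in the patients log, each resource in this invented data performs one activity, so every specialisation value is 1. This is my reading of the metrics, not edeaR's code.

```r
# Hypothetical log: patient, activity (handling) and resource (employee)
log <- data.frame(
  patient = c("1", "1", "1", "2", "2", "3"),
  handling = c("Registration", "Triage", "X-Ray", "Registration", "Triage", "Registration"),
  employee = c("r1", "r2", "r5", "r1", "r2", "r1")
)
n_cases <- length(unique(log$patient))

# Involvement: share of cases in which each resource takes part
involvement <- tapply(log$patient, log$employee,
                      function(p) length(unique(p))) / n_cases

# Specialisation: number of distinct activities per resource
specialisation <- tapply(log$handling, log$employee,
                         function(a) length(unique(a)))
print(involvement)
print(specialisation)
```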
The code patients %>% resource_map() generates a
resource map of the patients dataset: a process map whose nodes are
resources and whose edges show how often work is handed over from one
resource to the next.
patients %>%
resource_map()
The code patients %>% resource_matrix() %>% plot()
generates and visualizes a resource matrix for the patients dataset:
a precedence matrix between resources that shows how often work is
handed over from one resource to another.
patients %>%
resource_matrix() %>%
plot()